Skip to content

Dynamic Loading IB verbs#253

Draft
AtlantaPepsi wants to merge 2 commits intoROCm:candidatefrom
AtlantaPepsi:ibdl
Draft

Dynamic Loading IB verbs#253
AtlantaPepsi wants to merge 2 commits intoROCm:candidatefrom
AtlantaPepsi:ibdl

Conversation

@AtlantaPepsi
Copy link
Copy Markdown
Contributor

@AtlantaPepsi AtlantaPepsi commented Apr 8, 2026

Motivation

  • Add option to dynamically link IB verbs to TransferBench
  • minor compilation fixes and new macro to separate cuMem from pod communication

Technical Details

Created a separate IbvDynload.hpp header in src for the purpose of dynamic loading of ibverbs, which original TransferBench.hpp would depend on.

Test Plan

Tested all combination of NIC IBV NVML POD MPI enablement and confirmed no compilation or linking error.

Test Result

Submission Checklist

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces an IBV_DIRECT build/runtime switch intended to support resolving libibverbs symbols either via direct linkage or via dlopen/dlsym, and adds a runtime check to block NIC executor usage when verbs aren’t available.

Changes:

  • Added IBV_DIRECT mode plumbing (Makefile + CMake option) to toggle direct symbol binding vs dlsym resolution.
  • Introduced pfn_* function pointers and updated some verbs call sites/macros to call through them.
  • Added System::IbvLoaded() and a guard to error out when NIC executor is requested but verbs aren’t loaded.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

File Description
src/header/TransferBench.hpp Adds libibverbs dynamic-loading infrastructure (pfn_*, Ibvdl()), runtime loaded-state tracking, and uses pfn_* in some verbs calls.
Makefile Adds DISABLE_IBV_DIRECT flag and sets -DIBV_DIRECT=1 by default when NIC exec is enabled.
CMakeLists.txt Adds ENABLE_IBV_DIRECT option and defines IBV_DIRECT=1 when enabled.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/header/TransferBench.hpp Outdated
Comment thread src/header/TransferBench.hpp Outdated
Comment thread src/header/TransferBench.hpp Outdated
Comment thread src/header/TransferBench.hpp Outdated
Comment thread src/header/TransferBench.hpp Outdated
Comment thread Makefile
Comment on lines 100 to +103
LDFLAGS += -libverbs
NIC_ENABLED = 1
ifneq ($(DISABLE_IBV_DIRECT),1)
COMMON_FLAGS += -DIBV_DIRECT=1
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When DISABLE_IBV_DIRECT=1 (dlsym path), the build still unconditionally links -libverbs. Any remaining direct ibv_* calls (and even an unused -libverbs without --as-needed) can keep libibverbs as a hard runtime dependency, undermining the goal of optional runtime loading. Consider omitting -libverbs when using the dlsym path (and ensure all ibv_* usage goes through pfn_*), and add -ldl for toolchains/glibc versions where dlopen is not in libc.

Suggested change
LDFLAGS += -libverbs
NIC_ENABLED = 1
ifneq ($(DISABLE_IBV_DIRECT),1)
COMMON_FLAGS += -DIBV_DIRECT=1
NIC_ENABLED = 1
ifneq ($(DISABLE_IBV_DIRECT),1)
COMMON_FLAGS += -DIBV_DIRECT=1
LDFLAGS += -libverbs
else
LDFLAGS += -ldl

Copilot uses AI. Check for mistakes.
Comment thread CMakeLists.txt
Comment on lines 325 to +327
target_link_libraries(TransferBench PRIVATE ${IBVERBS_LIBRARY})
target_compile_definitions(TransferBench PRIVATE NIC_EXEC_ENABLED)
if(ENABLE_IBV_DIRECT)
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ENABLE_IBV_DIRECT=OFF claims to resolve symbols via dlsym at runtime, but the target still links against ${IBVERBS_LIBRARY} unconditionally when IBVERBS_FOUND. This keeps libibverbs as a hard dependency and defeats optional loading. If runtime optionality is intended, avoid linking ${IBVERBS_LIBRARY} when ENABLE_IBV_DIRECT is OFF and ensure all ibv_* usage is through the loaded function pointers.

Suggested change
target_link_libraries(TransferBench PRIVATE ${IBVERBS_LIBRARY})
target_compile_definitions(TransferBench PRIVATE NIC_EXEC_ENABLED)
if(ENABLE_IBV_DIRECT)
target_compile_definitions(TransferBench PRIVATE NIC_EXEC_ENABLED)
if(ENABLE_IBV_DIRECT)
target_link_libraries(TransferBench PRIVATE ${IBVERBS_LIBRARY})

Copilot uses AI. Check for mistakes.
Comment on lines 2880 to 2883
// Query the number of IBV devices
int numIbvDevices = 0;
ibv_device** deviceList = ibv_get_device_list(&numIbvDevices);
ibv_device** deviceList = pfn_ibv_get_device_list(&numIbvDevices);

Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Within GetIbvDeviceList(), most verbs calls were converted to pfn_* (e.g., pfn_ibv_get_device_list/pfn_ibv_open_device/etc.), but the function still calls ibv_free_device_list(deviceList) directly later in the same function. In the dlsym path this reintroduces a direct libibverbs symbol reference and breaks the “all calls through pfn_*” model. Switch that cleanup call to pfn_ibv_free_device_list as well.

Copilot uses AI. Check for mistakes.
Comment thread src/header/TransferBench.hpp Outdated
@AtlantaPepsi AtlantaPepsi changed the base branch from candidate to merge/TransferBench-v1.67.0 April 27, 2026 06:33
@AtlantaPepsi AtlantaPepsi marked this pull request as ready for review April 27, 2026 16:02
@AtlantaPepsi AtlantaPepsi requested review from a team as code owners April 27, 2026 16:02
@nileshnegi
Copy link
Copy Markdown
Collaborator

i'd say just merge into candidate for now.
@gilbertlee-amd mentioned we should get the other pod-presets merged into candidate as well, before promoting to develop

@AtlantaPepsi AtlantaPepsi changed the base branch from merge/TransferBench-v1.67.0 to candidate April 27, 2026 16:19
@AtlantaPepsi AtlantaPepsi marked this pull request as draft May 6, 2026 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants